1 Preface

1.1 Objectives

This project aims to provide insight into the conditions of the property market in Jabodetabek. The data used are the results of a quick sample of ± 10,000 houses offered on one of the marketplace sites in Indonesia in September 2020. The values shown do not necessarily represent the actual prices. This project is modified from Geospatial Analysis in R workshop by Algoritma Data Science School.

The final result of this project is a simple Jabodetabek housing dashboard which can be seen here. It looks like this:

Snapshot of the final dashboard

Figure 1.1: Snapshot of the final dashboard

1.2 Library and Setup

In this Library and Setup section you’ll see some code to initialize our workspace, and the packages we’ll be using for this project.

Packages are collections of R functions, data, and compiled code in a well-defined format. The directory where packages are stored is called the library. R comes with a standard set of packages. Others are available for download and installation. Once installed, they have to be loaded into the session to be used.

You will need to use install.packages() to install any packages that are not yet downloaded onto your machine. To install packages, type the command below on your console then press ENTER.

## DO NOT RUN CHUNK
# packages <- c("rgdal","sf","tidyverse","glue", "plotly", "maps","leaflet","leaflet.extras", "tmap", "flexdashboard","DT")
# 
# install.packages(packages)

Then you need to load the package into your workspace using the library() function. Special for this course, the rmarkdown packages do not need to be called using library().

# package for data wrangling & vis
library(tidyverse)
library(glue)
library(scales)

# package for spatial environment in R
library(sf)

# package for visualization
library(leaflet)
library(plotly)

2 Jakarta House Pricing Distribution

2.1 Data Preprocessing

Before starting the analysis, first, we have to read the data:

# Read data
df <- read.csv("data/listings.csv")
head(df)
#>   id          kota     kecamatan       harga KT KM  m2
#> 1  1 Jakarta Pusat        Gambir  3900000000  3  2 169
#> 2  2 Jakarta Pusat     Kemayoran  1450000000  3  2 160
#> 3  3 Jakarta Pusat        Gambir 11100000000 10  8 720
#> 4  4 Jakarta Pusat     Kemayoran   485000000  2  1  35
#> 5  5 Jakarta Utara Tanjung Priok  5600000000  6  5 240
#> 6  6 Jakarta Barat     Tamansari  3000000000 10  5 180

We want to compare the house pricing for each sub-district (Kecamatan) level. It’s better use the price per square meter since the size of the listed properties may vary:

# Make a dataframe that shows the house pricing for each sub-district (`Kecamatan`) level
df_agg <- df %>% 
  mutate(
    harga_m2 = harga / m2
  ) %>% 
  group_by(kota, kecamatan) %>% 
  summarise(harga_m2 = median(harga_m2),
            total_listings= n()) %>% 
  ungroup()

head(df_agg)
#> # A tibble: 6 x 4
#>   kota          kecamatan         harga_m2 total_listings
#>   <chr>         <chr>                <dbl>          <int>
#> 1 Bekasi        Tarumajaya       12281250             463
#> 2 Depok         Beji             11746032.             84
#> 3 Depok         Cimanggis        11679365.            110
#> 4 Depok         Cinere           12761905.             88
#> 5 Jakarta Barat Cengkareng       13040000             455
#> 6 Jakarta Barat Grogolpetamburan 15032680.            223

In order to make the data can be displayed in map, we need spatial data. This time I used shapefile of Indonesia provided by GADM. Indonesia’s spatial vector is provided up to 4 levels of granularity. In this case, we use a level of spatial vector that contains Kecamatan.

Let’s read the spatial data, then display it:

# Read the spatial data
idn <- st_read(dsn = "shp", layer = "idn")
#> Reading layer `idn' from data source `D:\Learn Data\AA Data Competition\Portofolio R Geospatial\rgeo-intro-master\shp' using driver `ESRI Shapefile'
#> Simple feature collection with 6694 features and 16 fields
#> geometry type:  MULTIPOLYGON
#> dimension:      XY
#> bbox:           xmin: 95.0097 ymin: -11.00762 xmax: 141.0194 ymax: 6.076941
#> geographic CRS: WGS 84
plot(idn$geometry)

Now, we have to join the to the spatial data. This process aims to project our housing data with the geographical information.

idn %>% 
  as.data.frame() %>% 
  head()
#>   GID_0    NAME_0   GID_1 NAME_1 NL_NAME_1     GID_2     NAME_2 NL_NAME_2
#> 1   IDN Indonesia IDN.1_1   Aceh      <NA> IDN.1.2_1 Aceh Barat      <NA>
#> 2   IDN Indonesia IDN.1_1   Aceh      <NA> IDN.1.2_1 Aceh Barat      <NA>
#> 3   IDN Indonesia IDN.1_1   Aceh      <NA> IDN.1.2_1 Aceh Barat      <NA>
#> 4   IDN Indonesia IDN.1_1   Aceh      <NA> IDN.1.2_1 Aceh Barat      <NA>
#> 5   IDN Indonesia IDN.1_1   Aceh      <NA> IDN.1.2_1 Aceh Barat      <NA>
#> 6   IDN Indonesia IDN.1_1   Aceh      <NA> IDN.1.2_1 Aceh Barat      <NA>
#>         GID_3           NAME_3 VARNAME_3 NL_NAME_3    TYPE_3    ENGTYPE_3
#> 1 IDN.1.2.1_1 Arongan Lambalek      <NA>      <NA> Kecamatan Sub-district
#> 2 IDN.1.2.2_1            Bubon      <NA>      <NA> Kecamatan Sub-district
#> 3 IDN.1.2.3_1   Johan Pahlawan      <NA>      <NA> Kecamatan Sub-district
#> 4 IDN.1.2.4_1        Kaway Xvi      <NA>      <NA> Kecamatan Sub-district
#> 5 IDN.1.2.5_1         Meureubo      <NA>      <NA> Kecamatan Sub-district
#> 6 IDN.1.2.6_1  Pantai Ceuremen      <NA>      <NA> Kecamatan Sub-district
#>      CC_3 HASC_3                       geometry
#> 1 1107062   <NA> MULTIPOLYGON (((95.97953 4....
#> 2 1107061   <NA> MULTIPOLYGON (((96.16601 4....
#> 3 1107050   <NA> MULTIPOLYGON (((96.13205 4....
#> 4 1107080   <NA> MULTIPOLYGON (((96.16397 4....
#> 5 1107081   <NA> MULTIPOLYGON (((96.25119 4....
#> 6 1107082   <NA> MULTIPOLYGON (((96.18817 4....

We can do this by joining the information of the city (kota) & sub-district (kecamatan) to NAME_2 & NAME_3 variables respectively in idn. By doing left join with df_agg as left_table the result table will only include the Jabodetabek area (same as df_agg)

# Join data
df_agg <- df_agg %>% 
  left_join(idn, by = c("kota" = "NAME_2", "kecamatan" = "NAME_3"))

head(df_agg)
#> # A tibble: 6 x 19
#>   kota  kecamatan harga_m2 total_listings GID_0 NAME_0 GID_1 NAME_1 NL_NAME_1
#>   <chr> <chr>        <dbl>          <int> <chr> <chr>  <chr> <chr>  <chr>    
#> 1 Beka~ Tarumaja~   1.23e7            463 IDN   Indon~ IDN.~ Jawa ~ <NA>     
#> 2 Depok Beji        1.17e7             84 IDN   Indon~ IDN.~ Jawa ~ <NA>     
#> 3 Depok Cimanggis   1.17e7            110 IDN   Indon~ IDN.~ Jawa ~ <NA>     
#> 4 Depok Cinere      1.28e7             88 IDN   Indon~ IDN.~ Jawa ~ <NA>     
#> 5 Jaka~ Cengkare~   1.30e7            455 IDN   Indon~ IDN.~ Jakar~ <NA>     
#> 6 Jaka~ Grogolpe~   1.50e7            223 IDN   Indon~ IDN.~ Jakar~ <NA>     
#> # ... with 10 more variables: GID_2 <chr>, NL_NAME_2 <chr>, GID_3 <chr>,
#> #   VARNAME_3 <chr>, NL_NAME_3 <chr>, TYPE_3 <chr>, ENGTYPE_3 <chr>,
#> #   CC_3 <chr>, HASC_3 <chr>, geometry <MULTIPOLYGON [°]>

Now that our df_agg data have the geographic attributes attached, we can turn it into an sf object using st_as_sf() function:

# Modify the data type
df_agg <- st_as_sf(df_agg)

df_agg$geometry
#> Geometry set for 59 features 
#> geometry type:  MULTIPOLYGON
#> dimension:      XY
#> bbox:           xmin: 106.6325 ymin: -6.39886 xmax: 107.0317 ymax: -6.0409
#> geographic CRS: WGS 84
#> First 5 geometries:

2.2 Building Maps

2.2.1 Distribution of House Price/m2 for each Kecamatan

Now, let’s plot the data to the map using leaflet packages.

# Leaflet
leaflet(df_agg) %>% # create map widget
  addProviderTiles(providers$CartoDB.Positron) %>%  # add basemap
  addPolygons() # add polygons from `sf` data %

Now color the polygons based on the house pricing for each kecamatan. First, we need to create a color palette to represent the data. In the chunk below, we create an object called pal which stores the colors generated from df_agg$harga_m2 values. The value passed to fill in palette is provided by ColorBrewer2 palettes.

pal <- colorNumeric(palette = "YlOrRd", domain = df_agg$harga_m2)

leaflet(df_agg) %>% 
   addProviderTiles(providers$CartoDB.Positron) %>% 
  addPolygons(
    fillColor = ~pal(harga_m2),
    fillOpacity = .8,
    weight = 2,
    color = "white"
  )

To make our map more interactive we can add feature like highlight and labels, so as the mouse passes over them, we can see the hover and the details about the sub district (kecamatan).

# Make labels variable for parameter in leaflet
labels <- glue::glue("
  <b>{df_agg$kecamatan}, {df_agg$kota}</b>:<br> {round(df_agg$harga_m2/1e6, 2)} jt/m2 <br> Total house listed: {df_agg$total_listings}"
) %>% lapply(htmltools::HTML)

We can add label and highlight argument inside the addPolygons() function.

# Add highlight and label to make map more interactive
leaflet(df_agg) %>% 
   addProviderTiles(providers$CartoDB.Positron) %>% 
  addPolygons(
    label = labels,
    fillColor = ~pal(harga_m2),
    fillOpacity = .8,
    weight = 2,
    color = "white",
    highlight = highlightOptions(
      weight = 5,
      color = "black",
      bringToFront = TRUE,
      opacity = 0.8
    )
  )

As the final step, we might also need to add a legend to give information about the colors and intervals. To add the legend, we only a layer of addLegend() function:

leaflet(df_agg) %>% 
   addProviderTiles(providers$CartoDB.Positron) %>% 
  addPolygons(
    label = labels,
    fillColor = ~pal(harga_m2),
    fillOpacity = .8,
    weight = 2,
    color = "white",
    highlight = highlightOptions(
      weight = 5,
      color = "black",
      bringToFront = TRUE,
      opacity = 0.8
    )
  ) %>% 
  addLegend(
    pal = pal,
    values = ~harga_m2,
    opacity = 1,
    title = "Harga/m2",
    position = "bottomright"
  )

To separate DKI Jakarta’s area with the other area, I create a border.

border  <- df_agg %>% 
  filter(NAME_1 == "Jakarta Raya") %>% 
  group_by(NAME_1) %>% 
  summarise() 

Now add the border to the whole code. The full code will look like this:

leaflet(df_agg) %>% 
   addProviderTiles(providers$CartoDB.Positron) %>% # using `addProviderTiles()` instead of `addTiles()`
  addPolygons(
    label = labels,
    labelOptions = labelOptions(
      style = list(
        "font-size"="13px",
        "background-color"="black",
        "color"="white"
      )
    ),
    weight = 2,
    color = "white",
    fillOpacity = .8,
    fillColor = ~pal(harga_m2),
     highlight = highlightOptions(
    weight = 5,
    color = "black",
    bringToFront = TRUE,
    sendToBack = TRUE,
    opacity = 0.8)
  ) %>% 
  addPolylines(
    data = border,
    color = "darkred",
    opacity = .8,
    weight = 2.5
  ) %>% 
  addLegend(
    pal = pal,
    values = ~harga_m2,
    opacity = 1,
    title = "Harga/m2",
    position = "bottomright"
  ) 

2.2.2 Housing Heatmap

Now, I create a geographical density map to show the density of housing in Jabodetabek. Before doing that, we have to read the data first:

# Read perum data
perum <- read.csv('data/perumahan.csv')
head(perum)
#>               perumahan            kota  latitude longitude
#> 1 Bintaro Jaya Sektor 2 Jakarta Selatan -6.278108  106.7522
#> 2        Buaran Regency   Jakarta Timur -6.232117  106.9251
#> 3    Casa Permata Hijau Jakarta Selatan -6.221427  106.7832
#> 4     Cempaka Residence Jakarta Selatan -6.291828  106.7779
#> 5       Green Lake City   Jakarta Barat -6.177477  106.7113
#> 6          Green Lontar Jakarta Selatan -6.364691  106.7983

The geographical density map is presented using a heatmap:

# Create heatmap
library(leaflet.extras)

leaflet(perum) %>% 
  addProviderTiles(providers$CartoDB.Voyager) %>%
  addCircles(
    label = ~perumahan,
    color = "red"
  ) %>% 
  addHeatmap(
    radius = 10
  )

3 Publish the Map

An effective way to communicate data visualization is by creating a dashboard. Dashboard helps to communicate information intuitively and essential to support data-driven decision making. The dashboard made using Flexdashboard package

Here you can access simple Jabodetabek housing dashboard I created: Jabodetabek Property Analysis Dashboard

I add a Average Price per Kecamatan Table to the dashboard. The complete .Rmd for the dashboard look like this:

---
title: "Jabodetabek Property Analysis Dashboard"
output: 
  flexdashboard::flex_dashboard:
    theme: readable
---

```{r setup, include=FALSE}
library(flexdashboard)
library(tidyverse)
library(glue)
library(scales)
library(sf)
library(plotly)
library(leaflet)
library(leaflet.extras)

# read data
idn <- st_read(dsn = "shp", layer = "idn")
df <- read.csv("data/listings.csv")
perum <- read.csv('data/perumahan.csv')

# aggregate
df_agg <- df %>% 
  mutate(
    harga_m2 = harga / m2
  ) %>% 
  group_by(kota, kecamatan) %>% 
  summarise(harga_m2 = median(harga_m2),
            total_listings= n()) %>% 
  ungroup() %>% 
  left_join(idn, by = c("kota" = "NAME_2", "kecamatan" = "NAME_3")) %>% 
  st_as_sf()

```


Column {.tabset}
-------------------------------------

### Distribution of House Price/m2 for each Kecamatan

```{r}
pal <- colorNumeric(palette = "YlOrRd", domain = df_agg$harga_m2)

labels <- glue::glue("
  <b>{df_agg$kecamatan}, {df_agg$kota}</b>:<br> {round(df_agg$harga_m2/1e6, 2)} jt/m2 <br> Total house listed: {df_agg$total_listings}") %>% lapply(htmltools::HTML)

border  <- df_agg %>% 
  filter(NAME_1 == "Jakarta Raya") %>% 
  group_by(NAME_1) %>% 
  summarise() 

leaflet(df_agg) %>% 
  addProviderTiles(providers$CartoDB.Positron) %>%  
  addPolygons(
    label = labels,
    labelOptions = labelOptions(
      style = list(
        "font-size"="13px",
        "background-color"="black",
        "color"="white"
      )
    ),
    weight = 2,
    color = "white",
    fillOpacity = .8,
    fillColor = ~pal(harga_m2),
     highlight = highlightOptions(
    weight = 5,
    color = "black",
    bringToFront = TRUE,
    sendToBack = TRUE,
    opacity = 0.8)
  ) %>% 
  addPolylines(
    data = border,
    color = "darkred",
    opacity = .8,
    weight = 3
  ) %>% 
  addLegend(
    pal = pal,
    values = ~harga_m2,
    opacity = 1,
    title = "Harga/m2",
    position = "bottomright"
  ) %>% 
  setView(lat = -6.225337, lng = 106.841086, zoom=11.2)
```



### Housing Heatmap

```{r}
library(leaflet.extras)

leaflet(perum) %>% 
  addProviderTiles(providers$CartoDB.Voyager) %>%
  addCircles(
    label = ~perumahan,
    color = "red"
  ) %>% 
  addHeatmap(
    radius = 10
  )%>% 
  setView(lat = -6.225337, lng = 106.841086, zoom=11.2)



```




Column 
-------------------------------------

### Average Price per Kecamatan Table 

```{r}
library(DT)

data <- df_agg %>% 
  as.data.frame() %>% 
  arrange(desc(harga_m2)) %>% 
  select(kota, kecamatan, harga_m2) %>% 
  mutate(harga_m2 = number(harga_m2, big.mark = ",")) %>% 
  rename(
    Kota = kota,
    Kecamatan = kecamatan,
    `Price/m2` = harga_m2
  ) 

DT::datatable(
  data,
  extensions = "Buttons",
  options = list(
    pageLength = 25,
    dom = 'Bfrtip',
    buttons = c('csv','excel','pdf')
  )
)
```